Search for Visual News: Benchmark and Challenges in News Image Captioning

2024-11-08 03:14:38

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

2 code implementations • 20 Dec 2021

Specifically, the task involves multi-hop questions that require reasoning over image-caption pairs to identify the grounded visual object being referred to, and then predicting a span from the news body text that answers the question.

Answer Generation, Data Augmentation, +2
